Pivot Selection Methods Based on Covariance and Correlation for Metric-space Indexing
Authors
Abstract
Metric-space indexing is a general method for answering similarity queries over complex data. The quality of the index tree is a critical factor in query performance. Bulkloading a metric-space index tree can be described as two recursive steps, pivot selection and data partitioning, of which pivot selection dominates the quality of the index tree. Two heuristics for pivot selection, based on covariance and correlation, are proposed. Empirical results show that their performance is superior or comparable to that of existing methods.
Keywords: similarity query; metric-space indexing; pivot space model; pivot selection
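The abstract does not spell out the two heuristics, so the following is only a minimal sketch of what a correlation-based pivot selection could look like, not the authors' method. It assumes that each candidate pivot defines one coordinate of the pivot space (the distances from that candidate to every data object), that the first pivot is the candidate with the largest distance variance, and that each further pivot is the candidate least correlated with those already chosen. The names `abs_corr` and `select_pivots`, and the use of Euclidean distance, are illustrative assumptions.

```python
import math
from statistics import mean, pvariance


def abs_corr(xs, ys):
    """Absolute Pearson correlation of two equal-length distance columns."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        # A constant column carries no information; rank it as fully redundant.
        return 1.0
    return abs(cov) / math.sqrt(vx * vy)


def select_pivots(data, candidates, k, dist=math.dist):
    """Greedily pick k pivots whose distance columns are weakly correlated.

    Hypothetical heuristic for illustration only (see the assumptions above).
    """
    if not 1 <= k <= len(candidates):
        raise ValueError("k must be between 1 and len(candidates)")

    # One column per candidate: the data set seen along that pivot-space axis.
    columns = [[dist(c, x) for x in data] for c in candidates]

    # Assumed starting rule: the candidate whose distances vary the most.
    chosen = [max(range(len(candidates)), key=lambda i: pvariance(columns[i]))]

    while len(chosen) < k:
        remaining = [i for i in range(len(candidates)) if i not in chosen]
        # Pick the candidate whose worst-case correlation with the
        # already chosen pivots is smallest.
        best = min(
            remaining,
            key=lambda i: max(abs_corr(columns[i], columns[j]) for j in chosen),
        )
        chosen.append(best)

    return [candidates[i] for i in chosen]


# Toy usage: two candidates lie in nearly the same direction, one does not.
data = [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (0, 2), (0, 3)]
candidates = [(10, 0), (20, 0), (0, 10)]
pivots = select_pivots(data, candidates, k=2)
```

In this toy setting the candidates `(10, 0)` and `(20, 0)` produce almost perfectly correlated distance columns, so a correlation-driven rule avoids choosing both. A covariance-based variant could replace `abs_corr` with the unnormalised covariance, again as an assumption rather than the paper's definition.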
Similar resources
Pivot-based Metric Indexing
The general notion of a metric space encompasses a diverse range of data types and accompanying similarity measures. Hence, metric search plays an important role in a wide range of settings, including multimedia retrieval, data mining, and data integration. With the aim of accelerating metric search, a collection of pivot-based indexing techniques for metric data has been proposed, which reduces...
Optimal Pivot Selection Method Based on the Partition and the Pruning Effect for Metric Space Indexes
This paper proposes a new method to reduce the cost of nearest neighbor searches in metric spaces. Many similarity search indexes recursively divide a region into subregions by using pivots, and construct a tree-structured index. Most recently developed indexes focus on pruning objects and do not pay much attention to tree balancing. As a result, indexes having imbalanced tree-structure ...
Margin-Based Pivot Selection for Similarity Search Indexes
When developing an index for a similarity search in metric spaces, how to divide the space for effective search pruning is a fundamental issue. We present Maximal Metric Margin Partitioning (MMMP), a partitioning scheme for similarity search indexes. MMMP divides the data based on its distribution pattern, especially for the boundaries of clusters. A partitioning boundary created by MMMP is lik...
Reduction of Distance Computations in Selection of Pivot Elements for Balanced GHT Structure
For objects in general metric spaces, generalized hyperplane indexing is one of the most widely used indexing techniques. In this paper, some methods are presented to improve the quality of the partitioning in the generalized hyperplane tree structure from the viewpoint of the balancing factor. The proposed method to represent the elements in the target domain metric space is the usage of a distance ...
Efficient Document Indexing Using Pivot Tree
We present a novel method for efficiently searching for the top-k neighbors of documents represented in a high-dimensional term space, based on cosine similarity. Documents are mostly stored in a bag-of-words tf-idf representation. One of the most widely used ways of computing the similarity between a pair of documents is the cosine similarity between their vector representations, but cosine similarity is not a met...